Abstract — Motivated by recent results in optical communications, where the performance can degrade dramatically if the transmitted power is sufficiently increased, the channel capacity is characterized for various kinds of memoryless vector channels. It is proved that for all single-user channels, the channel capacity is a nondecreasing function of power, unless the channel model depends on the source. More generally, the capacity-cost function is nondecreasing under the same condition. As a consequence, for such channels, maximizing the mutual information over all source distributions with a certain cost is equivalent to maximizing it over the larger set of source distributions with upper-bounded cost. For multiuser channels, there are several scenarios. The primary channel capacity-cost function of an interference channel is always nondecreasing if all interferers transmit with the same distributions as the primary user, but not always if only some parameters of the interferers' distributions depend on the primary user. Finally, if all source distributions in an interference channel are optimized jointly, then the achievable sum-rate capacity is again nondecreasing.

Index Terms — Capacity-cost function, channel capacity, constrained capacity, mutual information, nonlinear distortion, optical communications, Shannon capacity.



I. Introduction 

IN THE MOST cited paper in the history of information theory [1], Shannon proved in 1948 that with adequate coding, reliable communication is possible over a noisy channel as long as the rate does not exceed a certain threshold, called the channel capacity. He provided a mathematical expression for the channel capacity of any single-user channel, based on its statistical properties. The expression is given as the supremum, over all possible source distributions, of a quantity later called the mutual information [2], [3]. The channel capacity is often studied as a function of a cost, such as the average transmitted power. More specifically, the capacity-cost function is defined as the supremum of the mutual information over all source distributions whose cost is either equal to a given constant or upper-bounded by a constant; the convention differs between disciplines. In this paper, we adopt the former definition, which is prevalent in optical communications, and ask whether the channel capacity is a monotonically nondecreasing function of cost or whether it has a peak at some cost. (With the second definition, the question would be trivial.) The main contribution of this paper is to show that for all static single-user channels and some multiuser channels, the channel capacity can never decrease as the cost increases. As a consequence, the two definitions are fully equivalent for such channels.
Swedish Foundation for Strategic Research (SSF) under grant RE07-0026 and the Swedish Research Council (VR) under grant 2007-6223. E. Agrell is with the Dept. of Signals and Systems, Chalmers Univ. of Technology, SE-41296 Göteborg, Sweden (e-mail: agrell@chalmers.se).



II. Wireless and Optical Channel Models 

For linear channels with additive, signal-independent noise, the channel capacity is an increasing function of the cost. The best-known example is the additive white Gaussian noise (AWGN) channel, for which the channel capacity is known exactly |QQ Sec. 24], [4, Ch. 9]. In recent years, the problem of calculating or estimating the channel capacity of more complicated channels has received a lot of attention (see surveys in O-Q). Due to the absence of exact analytical solutions and the computational intractability of optimizing over all possible source distributions, most investigations of the channel capacity of non-AWGN channels rely on bounding techniques and asymptotic analysis.

If only noncoherent detection is available at the receiver, the channel capacity can be analyzed by including a magnitude operation at the output of a discrete-time complex AWGN channel. The channel capacity is in this case not known exactly, but it increases logarithmically with transmitted power, asymptotically as approximately half the regular AWGN channel capacity O, [9, Sec. 11.2]. The same behavior has been shown for the phase-noise channel, in which the transmitted signal is subject to a uniformly random phase shift before the Gaussian noise is added Q, [10], [11]; indeed, according to 0, these two channel models are equivalent in terms of channel capacity.

For the Rayleigh-fading channel, the channel capacity increases logarithmically with power, with only an asymptotic offset relative to the AWGN channel capacity, if the receiver has full channel state information [12], [13]. The increase is doubly logarithmic if no channel state information is available [13]-[15]. The results have been extended to other wireless channel models, including Rician fading, systems with transmitter-side channel state information, and multiple-antenna channels; see d, ED, El, 01 Sec. 4.2-4.3, 10.3, 14.5-14.7] and references therein. In all these cases, the channel capacity is an increasing function of the transmitted cost (power), which is consistent with the main result of this paper.

The main motivation for this paper comes from the type of nonlinear distortion encountered in fiber-optical communications [19], [20, Sec. 7.2]. The impact of this nonlinear distortion increases dramatically with the transmitted power, to the extent that communication becomes virtually impossible if the instantaneous power is high enough [21]-[25], [26, Ch. 9]. This phenomenon is well known from experiments and simulations. Thus, one might expect that the mutual information and channel capacity would approach zero at sufficiently high power.

Preprint, Sept. 13, 2012

In most fiber-optical transmission systems, each fiber is shared between several users by wavelength division multiplexing (WDM). From an information theory viewpoint, this is either a multiple-access channel, if all received signals are available to all receivers, or an interference channel, if the signals available to each receiver are different. In the former case, it is assumed that all receivers are physically colocated and that multiuser detection is applied. Although multiuser detection in WDM systems can significantly improve the performance in terms of bit error rate [27], [28] and channel capacity [29] compared with single-user detection, multiple-access channels have so far received relatively little attention in optical communications research. They will not be considered further in this paper, where WDM transmission will be modeled as an interference channel. The transmission on each wavelength in a WDM system is affected by nonlinear in-band distortion as well as by interference from the signals on other wavelengths. The most important types of optical nonlinear distortion are self-phase modulation (SPM), four-wave mixing (FWM), and cross-phase modulation (XPM), where the latter two occur only in multiuser channels.

In optical channel modeling, we can distinguish between static and dynamic channel models. A static channel model is one where the statistics of the channel output depend only on the channel input, whereas the output statistics of a dynamic channel model vary not only with the channel input but also with the distribution from which this channel input is drawn. Most, if not all, channel models considered in classical information theory by Shannon and his successors CD Sec. 11, 23], ED, EO Ch. 2] are static, as are the wireless channel models cited above. However, channel models of both kinds have been proposed for optical transmission.

A static channel model for optical transmission in the presence of SPM and noise was introduced by Mecozzi [21], and several variants and extensions thereof have been proposed G21-E1, pp. 157, 225-226]. The channel capacity of these models was analyzed in [23], [32], [35], and in all cases it was found to be a monotonic function of the transmitted power. Even if a suboptimal constant-intensity modulation format is used, the mutual information of the SPM channel is monotonic [35], [36]. These results agree well with the classical results for constant-intensity modulation over the AWGN channel E), [37].

Splett et al. modeled the interference from FWM in a WDM system as an AWGN component, under some conditions on the noise and dispersion, in what might have been the first study ever of the channel capacity of an optical transmission system [38]. The model is dynamic, as the variance of this AWGN depends nonlinearly on the transmitted power, which is assumed equal on all wavelengths. Similar dynamic FWM models have been rediscovered, modified, and further analyzed in [39]-[44]. Due to the source-dependent noise, their channel capacities have the general behavior shown in Fig. |8j: as the average power (or signal-to-noise ratio) increases, the channel capacity increases towards a peak and then decreases again as the power is further increased. Other dynamic channel models, with similar nonmonotonic channel capacities, were presented in [42]-[47].

A continuous-time channel model for XPM was presented by Mitra and Stark [48]. Although no discrete-time XPM model was obtained, they showed that the channel capacity of the XPM channel model is lower-bounded by the channel capacity of a dynamic AWGN channel, and that this lower bound is nonmonotonic. They further conjectured that the true channel capacity would have a similar nonmonotonic behavior as its lower bound. Many variants of the Mitra-Stark lower bound have been presented in recent years, often along with the conjecture that the true channel capacity is also nonmonotonic [40], [49]-[53]. However, the work by Turitsyn et al. [32] shows that the Gaussian lower bound for SPM, analogous to the bound proposed in [48], is very far from the true channel capacity in the nonlinear regime (high power), and that the channel capacity in fact grows logarithmically with power under certain conditions. Thus, it is still an open question to what extent the bound in [48] represents the actual channel capacity [6], [54].

Another type of lower bound on channel capacity is obtained by fixing the source distribution and calculating the mutual information Q, [36], [55]-[57], or by optimizing the mutual information over a subset of all possible source distributions Q, ED, E), ED, 03. All these lower bounds consistently show a nonmonotonic behavior, decreasing towards zero after a peak at a finite power, and the conjecture that the channel capacity would have a similar nonmonotonic behavior as its lower bounds is often repeated. The purpose of the present paper is to partially settle this conjecture, by proving it in some single- and multiuser scenarios and disproving it in others. Contrary to most earlier works, we will not elaborate on any specific optical channel model, although a few will be included as examples, but rather characterize the behavior of the channel capacity for some general classes of channel models.

III. Channel Capacity, Constrained Capacity, and Mutual Information

Let $X$ and $Y$ be real, $n$-dimensional vectors, representing the input and output, resp., of a discrete-time memoryless communication channel. The joint distribution $f_{X,Y}(x,y)$ can be factorized as $f_{X,Y}(x,y) = f_X(x) f_{Y|X}(y|x)$, where $f_X$ represents the source and $f_{Y|X}$ represents the channel. As mentioned in Sec. II, the channel model can be either static, if $f_{Y|X}(y|x)$ depends only on $x$ and $y$ regardless of $f_X$, or dynamic, if $f_{Y|X}(y|x)$ changes depending on which source distribution $f_X$ it is combined with.

With every source vector $x$ is associated a certain cost $b(x) \ge 0$. The cost of a random source, denoted by $b(X)$, is defined as $\mathbb{E}[b(X)]$, where the expectation is taken over all source vectors $X = x$. We assume that the cost function is continuous and unbounded, in the sense that there exist distributions with any cost $\beta \ge 0$. The most common cost function, and the only one that will be exemplified in this paper, is the (average) transmitted power $P = \mathbb{E}[\|X\|^2]$.

We denote the mutual information between $X$ and $Y$ by $I(X;Y)$, while $I(X;Y|Z)$ denotes a conditional mutual information. The entropy and conditional entropy are denoted by $H(X)$ and $H(X|Z)$, resp., and the differential entropy and conditional differential entropy are denoted by $h(X)$ and $h(X|Z)$, resp.

By applying a channel code to blocks of source vectors $X$, these can be communicated over the channel at arbitrarily low error probability, provided that the rate, in bits per channel use, is sufficiently low. Such a rate is called an achievable rate, and the supremum of all achievable rates, over all possible codes and block lengths, is defined as the operational channel capacity. For point-to-point channels, Shannon's channel coding theorem [1, Sec. 13, 23], Sec. 7.7, 9.1] states that the operational channel capacity is equal to the information channel capacity, which is defined as the supremum of the mutual information $I(X;Y)$ between the channel input and output, where the supremum is taken over all source distributions $f_X$. The capacity-achieving distribution may be continuous or discrete [16], |6Q|.
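For a discrete memoryless channel, the supremum over source distributions can be evaluated numerically with the Blahut-Arimoto algorithm. The sketch below is an illustration only, not a method used in this paper, and the function name `blahut_arimoto` and its parameters are hypothetical. With a Lagrange multiplier `s > 0` and a per-symbol cost `b(x)`, the same iteration returns a point on the capacity-cost curve with an upper-bounded cost; since a finite input alphabet has bounded cost, this illustrates a constrained setting rather than the unbounded-cost channels analyzed here.

```python
import numpy as np


def blahut_arimoto(W, cost=None, s=0.0, tol=1e-12, max_iter=100_000):
    """Blahut-Arimoto iteration for a discrete memoryless channel (illustrative sketch).

    W[x, y] is the transition probability f_{Y|X}(y|x). With s = 0 the result is
    the unconstrained capacity sup I(X;Y). With s > 0 and a per-symbol cost b(x),
    the iteration maximizes I(X;Y) - s*b(X) (s in bits per unit cost), giving one
    point (cost, rate) on the capacity-cost curve with upper-bounded cost.
    Returns (rate in bits, average cost, input pmf).
    """
    W = np.asarray(W, dtype=float)
    b = np.zeros(W.shape[0]) if cost is None else np.asarray(cost, dtype=float)
    p = np.full(W.shape[0], 1.0 / W.shape[0])  # start from the uniform input pmf

    def divergences(p):
        # D(W(.|x) || q) in bits for every input x, where q = p W is the output pmf.
        q = p @ W
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = np.where(W > 0, W * np.log2(W / q), 0.0)
        return terms.sum(axis=1)

    for _ in range(max_iter):
        p_new = p * np.exp2(divergences(p) - s * b)
        p_new /= p_new.sum()
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    # I(X;Y) = sum_x p(x) D(W(.|x) || q)
    return float(p @ divergences(p)), float(p @ b), p


# Z-channel: input 1 is received as 0 with probability 0.5.
Z = [[1.0, 0.0], [0.5, 0.5]]
rate, _, p_opt = blahut_arimoto(Z)
print(round(rate, 6))  # log2(1.25) = 0.321928
```

Increasing `s` moves the optimum towards cheaper inputs, which traces the upper-bounded-cost curve from its unconstrained maximum downwards.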

In this paper, we study the channel capacity $C$ as a function of a cost $\beta$. This capacity-cost function can be defined in two subtly different ways, depending on whether the cost is upper-bounded by $\beta$ or exactly $\beta$. In the first case, which is most common in classical information theory iRTTl Ch. 7], [62], [4, Ch. 9], [63, Sec. 3.3], the information channel capacity is defined as

$$\bar{C}(\beta) \triangleq \sup_{f_X \in \bar{\Omega}(\beta)} I(X;Y), \tag{1}$$

where $\bar{\Omega}(\beta)$ is the set of all distributions $f_X$ over $\mathbb{R}^n$ such that $b(X) \le \beta$. In the second case, which is prevalent in optical information theory ED, EH), ED, p. 355] and also sometimes used in wireless communications [14], the information channel capacity is

$$C(\beta) \triangleq \sup_{f_X \in \Omega(\beta)} I(X;Y), \tag{2}$$

where $\Omega(\beta)$ is the set of all distributions $f_X$ over $\mathbb{R}^n$ such that $b(X) = \beta$.

In this paper, information channel capacity refers to an exact cost constraint, i.e., $C(\beta)$ in (2), and similarly for the operational channel capacity. This is partly because the work was inspired by capacity results in optical communications, where this is the conventional definition, and partly because the fundamental question considered in this paper, about the monotonic behavior of channel capacity, is trivial in terms of $\bar{C}(\beta)$ [29], ETJ Ch. 2]. That $\bar{C}(\beta)$ is nondecreasing for all channels follows from (1) and the fact that $\bar{\Omega}(\beta) \supseteq \bar{\Omega}(\beta')$ for all $\beta > \beta'$. However, the two definitions are in fact equivalent for static point-to-point channels, as will be shown in Theorem 2.

If the optimization of $I(X;Y)$ is instead done over a subset of $\Omega(\beta)$ (or $\bar{\Omega}(\beta)$), a constrained capacity is obtained. Many versions of constrained capacity have been studied in the past, such as confining $X$ to a certain range or to a certain discrete constellation.

To summarize the terminology used in this paper, we will use "achievable rate" or "mutual information" when no optimization is carried out, "constrained capacity" when the optimization is over some, but not all, possible source distributions, and "channel capacity" when the optimization is over all possible source distributions. Thus, the mutual information between input and output is a property of the channel and the source, the constrained capacity is a property of the channel and the source constraints, and the channel capacity is a property of the channel alone. We avoid using the single word "capacity" in this paper unless the type of capacity is clear from the context.

IV. Point-to-Point Channels 

In this section, we consider point-to-point vector channels, i.e., single-user channels, as defined in the previous section. The analysis will be generalized to some instances of interference channels in the next section. Furthermore, the channel is assumed to be discrete-time and memoryless, without any essential loss of generality: the channel model can represent a continuous-time bandlimited channel by sampling the transmitted and received waveforms at the Nyquist rate 0] Sec. 23], and it can approximate channels with memory by choosing a large enough dimension $n$ [30].

A. Static Channel Models 

As mentioned in Sec. II, we distinguish between static and dynamic vector channel models. Examples of static models in optical channel modeling include ETJ, E2)-[34], Sec. 6.5]. Static models are closer to physical reality, in the sense that the channel output in experiments and installed transmission systems depends on what was actually transmitted over it ($X$), not on what might have been transmitted ($f_X$). In other words, if two source distributions $f_{X_1}$ and $f_{X_2}$ happen to generate the same source vector $X$, then the physical channel output $Y$ should follow the same distribution in both cases. A drawback of static vector models is that a large dimension $n$ may be required in order to accurately capture dispersion and other memory effects.

The main result for point-to-point channels is the following 
theorem, which implies that the channel capacity will either 
increase indefinitely or converge to a finite value as the cost 
increases, depending on the channel. However, it cannot have 
a peak for any static channel at any cost.

Theorem 1 (Law of Monotonic Channel Capacity): $C(\beta)$ is a nondecreasing function of $\beta$ for any static point-to-point channel.

In the interest of saving space, we will not present an explicit proof at this point. However, the point-to-point channel can be regarded as a special case of an interference channel, to be analyzed in Sec. V. A formal proof of Theorem 1 is obtained by setting $K = 1$ in Theorem 4, Corollary 6, or Corollary 7. Alternatively, the theorem can be proved using the convexity of the capacity-cost function [31, Sec. 2.1] or the lower semicontinuity of relative entropy [64, Sec. 1.4]. Intuitively, a source distribution with a nondecreasing mutual information can be constructed as a satellite distribution [23], [25], where the source vector $X = x$ has a moderate cost $b(x)$ with high probability, regardless of how large the average cost $b(X)$ is. As $b(X)$ increases, the distribution changes such that a low-probability portion thereof, a "satellite," moves towards higher costs, whereas the main part of the distribution remains essentially unchanged.
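The satellite construction can be tried on a toy example. The sketch below is an illustration only; the channel $Y = X + N$ with $N$ uniform on $\{-1, 0, 1\}$, the base distribution, and the helper names are hypothetical choices. Starting from a base distribution $f_{X'}$ with cost $\beta'$, it places probability $\epsilon$ on a single point of cost $\beta'' = \beta' + (\beta - \beta')/\epsilon$, so that the total power is exactly $\beta$. Because $Q \to X \to Y$ is a Markov chain for the time-sharing indicator $Q$, $I(X;Y) \ge I(X;Y|Q) \ge (1-\epsilon) I(X';Y')$, however large $\beta$ is.

```python
import math
from collections import defaultdict

# Hypothetical static channel: Y = X + N, with N uniform on {-1, 0, 1} and independent of X.
NOISE = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}


def mutual_information(px):
    """I(X;Y) in bits for a discrete input pmf px = {x: probability}."""
    py = defaultdict(float)
    for x, p in px.items():
        for n, q in NOISE.items():
            py[x + n] += p * q
    return sum(p * q * math.log2(q / py[x + n])
               for x, p in px.items() if p > 0
               for n, q in NOISE.items())


def cost(px):
    """Average power b(X) = E[X^2]."""
    return sum(p * x * x for x, p in px.items())


def with_satellite(px_base, beta, eps):
    """Mix px_base (cost beta') with one point of cost beta'' so that the total cost is beta."""
    beta_base = cost(px_base)
    beta_sat = beta_base + (beta - beta_base) / eps   # beta'' = beta' + (beta - beta')/eps
    x_sat = math.sqrt(beta_sat)
    px = {x: (1 - eps) * p for x, p in px_base.items()}
    px[x_sat] = px.get(x_sat, 0.0) + eps
    return px


base = {x: 1 / 5 for x in range(-2, 3)}   # uniform on {-2, ..., 2}, cost beta' = 2
eps = 0.01
for beta in (5.0, 50.0, 500.0):
    px = with_satellite(base, beta, eps)
    print(f"beta={beta:6.1f}  cost={cost(px):8.3f}  "
          f"I={mutual_information(px):.4f}  bound={(1 - eps) * mutual_information(base):.4f}")
```

The main part of the distribution is untouched as $\beta$ grows; only the satellite moves outwards, which is why the mutual information cannot collapse.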

An immediate consequence of Theorem 1, given by the next theorem, is that the cost-limited channel capacity $\bar{C}(\beta)$ can be attained, or approached arbitrarily closely, by source distributions $f_X$ whose cost equals the maximum allowed value $\beta$. This means that the two definitions (1) and (2) are equivalent.

Theorem 2: For any static point-to-point channel and any $\beta$, $\bar{C}(\beta) = C(\beta)$.

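Proof: Since $\bar{\Omega}(\beta) = \bigcup_{0 \le \beta' \le \beta} \Omega(\beta')$, the definition (1) can be written as

$$\bar{C}(\beta) = \sup_{0 \le \beta' \le \beta} C(\beta'),$$

which by Theorem 1 is equal to $C(\beta)$. □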

B. Dynamic Channel Models 

Simulating a high-dimensional vector model is usually computationally intractable, and it is often infeasible to perform any kind of numerical optimization based thereon. However, a static vector model can under some circumstances be approximated by a dynamic scalar model with a much lower complexity. This approach has become quite popular in optical channel modeling. For example, if the vector model includes averaging random realizations of the interference over a large number of samples in time (dispersion) [24], PP, [42], [44] or frequency (WDM channels) [29], then the central limit theorem predicts that the total interference contribution can be replaced by a Gaussian random variable, whose variance depends not on the samples themselves but on their variance, i.e., on the distribution from which these samples were generated. The distinction is subtle and often numerically negligible in uncoded systems, but it is significant from an information theory perspective. Mathematically, a dynamic channel model is represented by a family of conditional distributions $f_{Y|X}$, one for each cost $\beta$, in contrast to static models, for which $f_{Y|X}$ is fixed regardless of $f_X$.
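The central-limit argument can be checked numerically. In the sketch below, an illustration only, the interference weights `h`, the binary (BPSK-like) interfering source, and the sizes are hypothetical. The static view computes the interference from the actual interfering samples; the dynamic view replaces it by a Gaussian variable whose variance is fixed by the source variance alone, i.e., by $f_X$ rather than by $X$.

```python
import numpy as np

rng = np.random.default_rng(1)

m = 64            # number of interfering samples averaged by the channel (hypothetical)
trials = 50_000   # Monte Carlo realizations
h = rng.normal(size=m) / np.sqrt(m)   # hypothetical fixed interference weights

# Static view: the interference is computed from the actual interfering samples,
# here drawn from a binary (BPSK-like) source with unit variance.
x = rng.choice([-1.0, 1.0], size=(trials, m))
w_static = x @ h

# Dynamic view: a Gaussian whose variance depends only on the source variance
# (a property of the distribution f_X), not on the samples themselves.
source_var = 1.0
var_dynamic = source_var * np.sum(h ** 2)

var_static = w_static.var()
kurtosis = np.mean((w_static - w_static.mean()) ** 4) / var_static ** 2
print(f"static variance {var_static:.4f}, Gaussian-model variance {var_dynamic:.4f}")
print(f"kurtosis of static interference {kurtosis:.3f} (a Gaussian has 3)")
```

The two views agree closely in second-order statistics, which explains their accuracy for uncoded performance, but only the static view lets the noise respond to what was actually transmitted.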

For the purpose of this paper, it suffices to say that no analogy to Theorem 1 exists for dynamic channel models. There are dynamic point-to-point models for which the channel capacity increases monotonically with cost (trivially, since static models are a special case), and others for which it does not. Dynamic channel models for which the channel capacity increases towards a peak at a finite cost, after which it decreases towards zero, are common (e.g., [45], [46]), and this behavior will be exemplified in Sec. VII. It is even possible to construct dynamic channel models for which the capacity-cost function is more exotic, such as monotonically decreasing, oscillatory, or discontinuous, although such models would hardly represent any physical reality.

To summarize this section, the channel capacity behaves fundamentally differently for static and dynamic channel models. Hence, the channel capacity calculated for a dynamic model may not necessarily predict the performance of the underlying static model well, nor that of the physical link. The accuracy of dynamic channel models, compared with their underlying static models, is an intricate subject, which will be investigated in a future paper. In short, even if a dynamic channel model predicts the uncoded performance very accurately, it should be used with caution in connection with channel capacity.



V. Interference Channels 

We consider a discrete-time, memoryless interference channel with $K$ users, each with the purpose of transmitting a message from a transmitter to a receiver [63, Ch. 6], for example an optical WDM system. Each transmitter $i = 1, \ldots, K$ encodes a message into a sequence of vectors $X_i$, which is transmitted and received as another sequence $Y_i$. The $i$th receiver attempts to recover $X_i$ based on $Y_i$, without knowledge of $Y_j$ for $j \ne i$. The statistics of the received vectors are given by the conditional distribution $f_{Y_1,\ldots,Y_K|X_1,\ldots,X_K}$. Independent data is transmitted by each user, and the joint source distribution $f_{X_1,\ldots,X_K}$ is therefore equal to the product of the marginal distributions $f_{X_1} \cdots f_{X_K}$. All source distributions $f_{X_i}$ are known to all users. From the viewpoint of user $i$, all interfering source vectors $X_j$ for $j \ne i$ are assumed to be independent between channel uses. This assumption, which is conventional in optical communications, is valid if the codebook of user $j$ is not known to user $i$, or if user $j$ transmits uncoded data.

We consider four scenarios in the following subsections. The first three correspond to selfish optimization by each user individually, which is presently the dominant optimization approach in optical communications research. The aim is to determine the maximum achievable rate of the primary user, from $X_1$ to $Y_1$, while treating the signals from the other users $X_2, \ldots, X_K$ as (nonlinear) noise. The received vectors $Y_2, \ldots, Y_K$ are unknown, and the channel can be represented by the conditional distribution $f_{Y_1|X_1,\ldots,X_K}$. These three scenarios differ in the assumptions made on $f_{X_2}, \ldots, f_{X_K}$: even though the sources are statistically independent, their distributions may change as $f_{X_1}$ is modified. The fourth and last scenario represents joint optimization of $f_{X_1}, \ldots, f_{X_K}$, considering the full interference channel model $f_{Y_1,\ldots,Y_K|X_1,\ldots,X_K}$.

The following lemma about conditional mutual information will be useful in Sec. V-C.

Lemma 3: For any $X$ and $Y$, and any discrete $Z$,

$$|I(X;Y) - I(X;Y|Z)| \le H(Z).$$

Proof: By the chain rule for mutual information,

$$I(X;Y,Z) = I(X;Y) + I(X;Z|Y),$$
$$I(X;Y,Z) = I(X;Z) + I(X;Y|Z).$$

Eliminating $I(X;Y,Z)$ and rearranging terms,

$$|I(X;Y) - I(X;Y|Z)| = |I(X;Z) - I(X;Z|Y)|.$$

Since $Z$ is discrete by assumption, the right-hand side can be upper-bounded using

$$0 \le I(X;Z) \le H(Z),$$
$$0 \le I(X;Z|Y) \le H(Z|Y) \le H(Z),$$

which completes the proof. □
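Lemma 3 can be checked numerically when $X$, $Y$, and $Z$ are all discrete. The sketch below is an illustration only, with hypothetical helper names; it computes the three quantities from entropies of random joint pmfs, using $I(X;Y) = H(X) + H(Y) - H(X,Y)$ and $I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)$.

```python
import numpy as np


def entropy(p):
    """Entropy in bits of a pmf given as an array of any shape."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def lemma3_terms(pxyz):
    """Return I(X;Y), I(X;Y|Z), and H(Z) for a joint pmf array pxyz[x, y, z]."""
    h_xyz = entropy(pxyz)
    h_xy, h_xz, h_yz = entropy(pxyz.sum(2)), entropy(pxyz.sum(1)), entropy(pxyz.sum(0))
    h_x, h_y = entropy(pxyz.sum((1, 2))), entropy(pxyz.sum((0, 2)))
    h_z = entropy(pxyz.sum((0, 1)))
    i_xy = h_x + h_y - h_xy
    i_xy_given_z = h_xz + h_yz - h_xyz - h_z
    return i_xy, i_xy_given_z, h_z


rng = np.random.default_rng(0)
worst = -np.inf   # largest observed value of |I(X;Y) - I(X;Y|Z)| - H(Z)
for _ in range(2000):
    shape = tuple(int(k) for k in rng.integers(2, 5, size=3))
    pxyz = rng.random(shape) ** 4   # skewed random joint pmf
    pxyz /= pxyz.sum()
    i_xy, i_xy_z, h_z = lemma3_terms(pxyz)
    worst = max(worst, abs(i_xy - i_xy_z) - h_z)
print(f"max of |I(X;Y) - I(X;Y|Z)| - H(Z): {worst:.3e}")
```

The bound is tight: with $X = Y = Z$ uniform on $\{0, 1\}$, $I(X;Y) = 1$, $I(X;Y|Z) = 0$, and $H(Z) = 1$.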

A. Fixed Interference Distributions 

Suppose that the distributions $f_{X_2}, \ldots, f_{X_K}$ are fixed and do not change even if $f_{X_1}$ changes. From the viewpoint of the primary user, the interference caused by the other users can be included in the channel model. Mathematically, the conditional distribution of the first transmitter-receiver pair can be expressed as

$$f_{Y_1|X_1}(y_1|x_1) = \mathbb{E}\left[f_{Y_1|X_1,X_2,\ldots,X_K}(y_1|x_1,x_2,\ldots,x_K)\right], \tag{3}$$

where the expectation is taken over all interferers $X_2 = x_2, \ldots, X_K = x_K$. This conditional distribution $f_{Y_1|X_1}$ does not depend on $f_{X_1}$ and can therefore be regarded as a static single-user channel. Hence, Theorem 1 applies, and the channel capacity of this interference channel is a nondecreasing function of the cost.
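For discrete alphabets, the expectation in (3) is a weighted average of transition matrices. The sketch below is an illustration only, for $K = 2$ with a hypothetical channel $Y_1 = X_1 \oplus X_2$ and a hypothetical function name; it shows that the resulting single-user channel is determined by the interferer's distribution and never involves $f_{X_1}$.

```python
import numpy as np


def effective_channel(W, p_interf):
    """Average f(y1 | x1, x2) over a fixed interferer pmf, as in (3).

    W[x1, x2, y1] = f_{Y1|X1,X2}(y1 | x1, x2) and p_interf[x2] = f_{X2}(x2).
    The result W_eff[x1, y1] = f_{Y1|X1}(y1 | x1) never involves f_{X1}.
    """
    return np.einsum("abc,b->ac", np.asarray(W, float), np.asarray(p_interf, float))


# Hypothetical two-user binary channel: Y1 = X1 XOR X2, with no other noise.
W = np.zeros((2, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        W[x1, x2, x1 ^ x2] = 1.0

W_eff = effective_channel(W, [0.9, 0.1])   # interferer sends 1 with probability 0.1
print(W_eff)   # a binary symmetric channel with crossover probability 0.1
```

Any single-user capacity tool can then be applied to `W_eff`, exactly as for a static point-to-point channel.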

B. Adaptive Interference Cost 

The most common approach to channel capacity analysis of WDM systems in the optical communications literature is to maximize the achievable rate of the primary user while treating the signals from the other users as noise. In addition, all users are usually assumed to transmit with the same cost, or even with the same distributions, which implies that the interference experienced by the primary user may adapt to this user's transmission scheme. Such adaptive interference is considered in this subsection and the next.

In this subsection, the interferers' distributions are fixed apart from a scaling factor. This would happen in heterogeneous multiuser systems, where the primary user applies adaptive modulation in order to maximize the achievable rate, whereas the other users apply a fixed modulation format with adaptive power. We do not claim that this is a very realistic scenario in practice, but we mention it for theoretical completeness and compatibility with existing literature (see below). The scenario has the curious consequence that the obtained channel capacities are not necessarily monotonic, in contrast to the scenarios in Secs. V-A, V-C, and V-D.

Suppose that a number of channels interfere with each other in a multiplicative way, so that the interference experienced by one channel is proportional to the product of several interfering signals. If all signals in such a system are rescaled proportionally, the interference power will grow faster than the signal power, and, in the absence of interference cancellation, the system performance will degrade. This is, in a somewhat simplified view, how the nonlinear interference behaves in an optical WDM system [24], [38]-[42], IE). For many common modulation formats, the error probability of such a system reaches a minimum value at a certain transmitted power and increases towards 1 if the power is further increased [24]. Similarly, the mutual information decreases towards zero at high enough power [43], [47]. The same nonmonotonic behavior is seen even if the primary source distribution is optimized, provided that the other distributions are not (heterogeneous multiuser system).

If the interfering distributions are all determined by the primary source distribution, as in this scenario, the interference channel model is effectively converted into a point-to-point channel, which is dynamic, since the random interference depends on the source distribution. In its simplest form, this dynamic point-to-point channel is an additive noise channel

$$Y = X + Z, \tag{4}$$

where $X$ and $Y$ are the transmitted and received symbols, resp., on the primary channel. The vector $Z$ represents the system noise as well as the combined interference from several users. It is statistically independent of $X$, but its statistics depend on $f_X$. In most of the literature cited above, $Z$ is assumed to follow a Gaussian distribution. This is a reasonable assumption if the interference consists of many random, independent components whose amplitudes do not vary too much, which is the case, e.g., if the interfering users apply conventional modulation formats such as quadrature-amplitude modulation or phase-shift keying RD . P4l. The advantage of this model is its simplicity. In some cases, it even admits an exact calculation of channel capacity, as will be exemplified in Sec. VII.
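A minimal sketch of such a dynamic model, for a scalar real channel and under two loudly stated assumptions: the Gaussian noise variance follows the hypothetical law $\sigma_Z^2 = \sigma_0^2 + \eta P^3$, and it depends on $f_X$ only through the average power $P$. For each fixed $P$, (4) is then an ordinary AWGN channel, so its channel capacity is $\frac{1}{2}\log_2(1 + P/\sigma_Z^2)$. Setting the derivative of $P/(\sigma_0^2 + \eta P^3)$ to zero gives a peak at $P^* = (\sigma_0^2/(2\eta))^{1/3}$, after which the capacity decays towards zero. The cubic law and the parameter values are illustrative, not taken from the cited models.

```python
import numpy as np


def dynamic_awgn_capacity(P, sigma0_sq=1.0, eta=1e-3):
    """Capacity in bits of the scalar real channel Y = X + Z, Z ~ N(0, sigma0_sq + eta * P**3).

    The cubic dependence on the average power P is a hypothetical stand-in for
    nonlinear interference; because the noise variance depends on f_X (through P),
    this is a dynamic channel model.
    """
    noise_var = sigma0_sq + eta * P ** 3
    return 0.5 * np.log2(1.0 + P / noise_var)


P = np.logspace(-2, 3, 2001)        # average powers from 0.01 to 1000
C = dynamic_awgn_capacity(P)
P_peak = P[np.argmax(C)]
P_peak_theory = (1.0 / (2 * 1e-3)) ** (1 / 3)   # (sigma0^2 / (2 eta))^(1/3), about 7.94
print(f"numerical peak at P = {P_peak:.2f}, analytical {P_peak_theory:.2f}")
print(f"C at peak = {C.max():.3f} bits, C at P = 1000: {C[-1]:.5f} bits")
```

Such a peak is a property of the dynamic model itself; by Theorem 1 it cannot occur for a static model.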

The assumption of moderate interference amplitude variations is important, as it implies that the assumption of $Z$ being Gaussian is not valid for all interferer distributions. For instance, if all users applied capacity-achieving distributions, then $Z$ would not be Gaussian, and the outcome would change entirely. This case will be analyzed in Secs. V-C and V-D.

C. Adaptive Interference Distributions 

In this subsection, we consider the scenario where all users transmit independent data drawn from the same distribution, or linearly rescaled versions thereof. If the primary user's distribution is $f_{X_1}$, then the other distributions are

$$f_{X_i}(x) = \alpha_i^n f_{X_1}(\alpha_i x), \quad i = 2, \ldots, K,$$

for some given constants $\alpha_2, \ldots, \alpha_K$. An important special case is $\alpha_2 = \cdots = \alpha_K = 1$, which makes all distributions $f_{X_1}, f_{X_2}, \ldots, f_{X_K}$ identical.
Theorem 4: If

$$f_{X_i}(x) = \alpha_i^n f_{X_1}(\alpha_i x), \quad i = 2, \ldots, K, \tag{5}$$

for some given constants $\alpha_2, \ldots, \alpha_K$, then the channel capacity is a nondecreasing function of $\beta_1 = b(X_1)$ for any interference channel $f_{Y_1|X_1,X_2,\ldots,X_K}(y_1|x_1,x_2,\ldots,x_K)$.

Proof: First, we verify that the channel coding theorem holds in the considered scenario. For any fixed primary user distribution $f_{X_1}$, the interferers' distributions $f_{X_2}, \ldots, f_{X_K}$ are also fixed, and the primary transmitter-receiver pair is characterized by the joint distribution

$$f_{X_1,Y_1}(x_1,y_1) = f_{X_1}(x_1) \cdot \mathbb{E}\left[f_{Y_1|X_1,X_2,\ldots,X_K}(y_1|x_1,x_2,\ldots,x_K)\right],$$

where the expectation is taken over all interferers $X_2 = x_2, \ldots, X_K = x_K$. By the single-user channel coding theorem UJ Sec. 23], ED Ch. 7], [63, Ch. 3], the mutual information $I(X_1;Y_1)$ of this joint distribution corresponds to an achievable rate of the primary channel. Hence, all rates below the primary channel capacity

$$C(\beta) \triangleq \sup_{f_{X_1} \in \Omega(\beta)} I(X_1;Y_1) \tag{6}$$

are achievable, and the channel coding theorem holds.

Fig. 1. An interference channel with time-sharing sources, analyzed in Sec. V-C. The primary channel $X_1 \to Y_1$ is affected by interference from the other sources $X_2, \ldots, X_K$.

To prove that $C(\beta)$ is nondecreasing, let $f_{X_1'} \in \Omega(\beta')$ be a capacity-achieving distribution at any cost $\beta'$. We will show that $C(\beta) \ge C(\beta')$ for any $\beta > \beta'$.

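For any given $\beta > \beta'$ and $0 < \epsilon < 1$, let

$$\beta'' \triangleq \beta' + \frac{\beta - \beta'}{\epsilon},$$

and let $f_{X_1''}$ be any distribution over $\mathbb{R}^n$ with cost $b(X_1'') = \beta''$. We now define a time-sharing random vector $X_1$ given an auxiliary binary random variable $Q_1$ such that

$$X_1 = \begin{cases} X_1', & Q_1 = 0, \\ X_1'', & Q_1 = 1, \end{cases} \tag{7}$$

where $\Pr\{Q_1 = 1\} = \epsilon$. The cost of $X_1$ is

$$\begin{aligned} b(X_1) &= (1-\epsilon)\, b(X_1') + \epsilon\, b(X_1'') \\ &= (1-\epsilon)\beta' + \epsilon\beta'' \\ &= \beta. \end{aligned}$$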

As illustrated in Fig. 1, the interference can be generated by an analogous time-sharing method, using the auxiliary variables $Q_2, \ldots, Q_K$. These variables have the same distribution as $Q_1$ and are independent of each other and also of $Q_1$. They control the interferers $X_2, \ldots, X_K$ such that $X_i = X_i'$ if $Q_i = 0$ and $X_i = X_i''$ if $Q_i = 1$, where

$$f_{X_i'}(x) = \alpha_i^n f_{X_1'}(\alpha_i x), \tag{8}$$

$$f_{X_i''}(x) = \alpha_i^n f_{X_1''}(\alpha_i x), \tag{9}$$

for $i = 2, \ldots, K$. Obviously, the time-sharing vector $X_i$ then has the desired distribution (5).

The mutual information of the primary channel can be bounded as

$$ \begin{aligned} I(X_1;Y_1) &\ge I(X_1;Y_1 \mid Q_1) && (10) \\ &\ge I(X_1;Y_1 \mid Q_1,Q_2,\ldots,Q_K) - H(Q_2,\ldots,Q_K), && (11) \end{aligned} $$

where (10) holds because $Q_1 \to X_1 \to Y_1$ is a Markov chain and (11) follows from Lemma 3. The first term of the right-hand side of (11) can be bounded as

$$ \begin{aligned} I(X_1;Y_1 \mid Q_1,Q_2,\ldots,Q_K) &= \sum_{(q_1,\ldots,q_K) \in \{0,1\}^K} \Pr\{Q_1 = q_1,\ldots,Q_K = q_K\} \cdot I(X_1;Y_1 \mid Q_1 = q_1,\ldots,Q_K = q_K) \\ &\ge \Pr\{Q_1 = \cdots = Q_K = 0\} \cdot I(X_1;Y_1 \mid Q_1 = \cdots = Q_K = 0) \\ &= (1-\epsilon)^K I(X_1';Y_1') \\ &= (1-\epsilon)^K C(\beta'). \end{aligned} \qquad (12) $$
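The second term of the right-hand side of (11) is

$$ H(Q_2,\ldots,Q_K) = \sum_{i=2}^{K} H(Q_i) = (K-1)\,H_2(\epsilon), \qquad (13) $$

where $H_2(p) = -p\log_2 p - (1-p)\log_2(1-p)$. Combining (11), (12), and (13) yields

$$ \begin{aligned} C(\beta) &= \sup_{f_{X_1} \in \Omega(\beta)} I(X_1;Y_1) \\ &\ge \sup_{0<\epsilon<1} \left[ (1-\epsilon)^K C(\beta') - (K-1)H_2(\epsilon) \right] \\ &= \lim_{\epsilon \to 0} \left[ (1-\epsilon)^K C(\beta') - (K-1)H_2(\epsilon) \right] \\ &= C(\beta'), \end{aligned} $$

where the last two steps hold because the bracketed expression never exceeds $C(\beta')$ and tends to $C(\beta')$ as $\epsilon \to 0$. This completes the proof. □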
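The limit in the final step of the proof can be checked numerically. The sketch below evaluates the bracketed lower bound $(1-\epsilon)^K C(\beta') - (K-1)H_2(\epsilon)$ together with the satellite cost $\beta''$ for decreasing $\epsilon$. The values $C(\beta') = 2$ bits, $K = 4$, $\beta' = 1$, and $\beta = 2$ are hypothetical and serve only as an illustration; they are not results for any particular channel.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function H_2(p) in bits, with H_2(0) = H_2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def satellite_cost(beta: float, beta_prime: float, eps: float) -> float:
    """Cost beta'' of the satellite X_1'' so that the mixture has cost beta."""
    return beta_prime + (beta - beta_prime) / eps


def time_sharing_bound(c_prime: float, eps: float, k: int) -> float:
    """Lower bound (1 - eps)^K C(beta') - (K - 1) H_2(eps) on C(beta)."""
    return (1 - eps) ** k * c_prime - (k - 1) * h2(eps)


# Hypothetical example values (not taken from the paper).
C_PRIME, K, BETA_PRIME, BETA = 2.0, 4, 1.0, 2.0

for eps in (1e-1, 1e-2, 1e-4, 1e-6):
    b_sat = satellite_cost(BETA, BETA_PRIME, eps)
    mix = (1 - eps) * BETA_PRIME + eps * b_sat   # average cost of X_1
    print(f"eps={eps:.0e}  beta''={b_sat:.3g}  cost={mix:.6f}  "
          f"bound={time_sharing_bound(C_PRIME, eps, K):.6f}")
```

As $\epsilon$ decreases, the satellite cost $\beta''$ grows like $1/\epsilon$ while the average cost stays at $\beta$. The bound is negative for $\epsilon = 0.1$ in this example, but it approaches $C(\beta') = 2$ from below as $\epsilon \to 0$.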
The time-sharing vector $X_1$ in (7) provides an example of a satellite distribution [23], where $X_1''$, the "satellite," carries a much higher cost than $X_1'$ and occurs with lower probability. We observe that the proof of Theorem 4 does not depend on the exact distributions $f_{X_1}, f_{X_2}, \ldots, f_{X_K}$, only on their common time-sharing parameter $\epsilon$. Therefore the condition (5) on $f_{X_2}, \ldots, f_{X_K}$ being rescaled versions of $f_{X_1}$, and the corresponding relations (8)-(9), can be relaxed. A more general theorem can be formulated that holds for any set of (possibly different) satellite distributions $f_{X_1}, f_{X_2}, \ldots, f_{X_K}$, as long as the time-sharing parameters $\epsilon_2, \ldots, \epsilon_K$ of the interferers all tend to 0 when the primary distribution's parameter $\epsilon_1$ approaches zero.

As a special case, Theorem 4 applies also to the point-to-point channel, by setting $K = 1$. As already mentioned in Sec. IV, this proves Theorem 1.

D. Joint Optimization 

In the fourth and last scenario, we assume that the system 
includes a mechanism to optimize the transmission schemes of 
all users jointly, for example via a central network controller. 

Let $R_i$ be an achievable rate for the transmitter-receiver pair $i = 1, \ldots, K$ and let $\mathbf{R} = (R_1, \ldots, R_K)$ be a vector of rates that can be simultaneously achieved over the interference channel, with arbitrarily small error probability. The capacity region $\mathcal{C}(\boldsymbol{\beta})$, where $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_K)$, is defined as the closure of the set of all achievable rate vectors $\mathbf{R}$ such that $f_{X_i} \in \Omega(\beta_i)$ for $i = 1, \ldots, K$ [63, pp. 82, 132]. Because no analytical expression is known for the capacity region of general interference channels [63, Ch. 6], we base the analysis of this scenario on the operational capacity (see Sec. III-B) and the time-sharing principle, rather than on mutual information expressions.

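Theorem 5: Let $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_K)$ and $\boldsymbol{\beta}' = (\beta_1', \ldots, \beta_K')$ be two cost vectors such that $\beta_i \ge \beta_i'$ for $i = 1, \ldots, K$. Then their capacity regions satisfy $\mathcal{C}(\boldsymbol{\beta}) \supseteq \mathcal{C}(\boldsymbol{\beta}')$.

Proof: Let, for any $0 < \epsilon < 1$,

$$ \boldsymbol{\beta}'' \triangleq \boldsymbol{\beta}' + \frac{\boldsymbol{\beta} - \boldsymbol{\beta}'}{\epsilon}. $$

Let $\mathbf{R}'$ and $\mathbf{R}''$ be achievable rate vectors at costs $\boldsymbol{\beta}'$ and $\boldsymbol{\beta}''$, resp. By time sharing [?, p. 534], [63, p. 85], the rate

$$ (1-\epsilon)\mathbf{R}' + \epsilon\mathbf{R}'' \ge (1-\epsilon)\mathbf{R}' $$

is achievable at cost

$$ (1-\epsilon)\boldsymbol{\beta}' + \epsilon\boldsymbol{\beta}'' = \boldsymbol{\beta}. $$

The capacity region $\mathcal{C}(\boldsymbol{\beta})$ thus includes all rate vectors of the form $(1-\epsilon)\mathbf{R}'$, where $\mathbf{R}'$ is achievable at cost $\boldsymbol{\beta}'$ and $\epsilon$ is an arbitrarily small positive number. Since the capacity region by definition is the closure of all achievable rate vectors [63, p. 82], $\mathcal{C}(\boldsymbol{\beta})$ also includes $\lim_{\epsilon \to 0} (1-\epsilon)\mathbf{R}' = \mathbf{R}'$. In conclusion, $\mathbf{R}' \in \mathcal{C}(\boldsymbol{\beta})$ for all $\mathbf{R}' \in \mathcal{C}(\boldsymbol{\beta}')$, which implies

$$ \mathcal{C}(\boldsymbol{\beta}) \supseteq \mathcal{C}(\boldsymbol{\beta}'). \qquad □ $$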
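The time-sharing step in this proof is componentwise arithmetic on cost and rate vectors. The sketch below uses hypothetical two-user cost and rate vectors and takes the worst case $\mathbf{R}'' = \mathbf{0}$, since sending no information is always possible. It confirms that the time-shared scheme meets the cost vector $\boldsymbol{\beta}$ while its rates $(1-\epsilon)\mathbf{R}'$ approach $\mathbf{R}'$.

```python
from typing import List


def time_share(v1: List[float], v2: List[float], eps: float) -> List[float]:
    """Componentwise (1 - eps) * v1 + eps * v2 for cost or rate vectors."""
    return [(1 - eps) * a + eps * b for a, b in zip(v1, v2)]


def satellite_costs(beta: List[float], beta_p: List[float], eps: float) -> List[float]:
    """beta'' = beta' + (beta - beta') / eps, componentwise."""
    return [bp + (b - bp) / eps for b, bp in zip(beta, beta_p)]


# Hypothetical two-user example with beta_i >= beta'_i.
beta_p = [1.0, 2.0]
beta = [3.0, 2.0]
r_p = [1.5, 0.8]      # rate vector assumed achievable at beta'
r_pp = [0.0, 0.0]     # worst case for the satellite phase

for eps in (0.1, 0.01, 0.001):
    cost = time_share(beta_p, satellite_costs(beta, beta_p, eps), eps)
    rates = time_share(r_p, r_pp, eps)
    print(eps, [round(c, 9) for c in cost], rates)
```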
The capacity region is a $K$-dimensional object, and it varies as a function of the $K$-dimensional cost vector $\boldsymbol{\beta}$. The following two corollaries elucidate how the achievable rates vary along certain cross-sections of this object.

Corollary 6: If the cost is varied along a line as

$$ \boldsymbol{\beta} = \boldsymbol{\beta}_0 + \mu\boldsymbol{\Delta}, $$

where all components of $\boldsymbol{\beta}_0$ and $\boldsymbol{\Delta}$ are nonnegative, then all achievable rates $R_1, \ldots, R_K$ are nondecreasing functions of $\mu \ge 0$, and the achievable sum rate $R_1 + \cdots + R_K$ is also a nondecreasing function of $\mu \ge 0$.

Corollary 7: If all transmitters obey the same cost constraint $\beta_1 = \cdots = \beta_K = \beta$, then all achievable rates $R_1, \ldots, R_K$ are nondecreasing functions of $\beta$.

VI. Numerical Example I: Point-to-Point Channel 

In this section, examples will be presented for mutual information, constrained capacity, and channel capacity as functions of the average transmitted power, where the mutual information and constrained capacity have peaks but the channel capacity, as predicted by Theorem 1, is nondecreasing.

A. A Nonlinear Channel 

We consider a very simple channel with nonlinear distortion and noise, represented as

$$ Y = a(X) + Z, \qquad (14) $$

where $X$ and $Y$ are the real, scalar input and output of the channel, resp., $a(\cdot)$ is a deterministic, scalar function and $Z$ is white Gaussian noise with zero mean and variance $\sigma_Z^2$. For a given channel input $x$, the statistics of the channel output are represented by the conditional probability density function (pdf)

$$ f_{Y|X}(y|x) = \frac{1}{\sigma_Z} f_G\!\left( \frac{y - a(x)}{\sigma_Z} \right), \qquad (15) $$

where $f_G(x) = (1/\sqrt{2\pi}) \exp(-x^2/2)$ is the zero-mean, unit-variance Gaussian pdf. Since $f_{Y|X}(y|x)$ is Gaussian for a given $x$, the conditional entropy is [1, Sec. 20], [14, Sec. 8.1]

$$ -\int_{-\infty}^{\infty} f_{Y|X}(y|x) \log_2 f_{Y|X}(y|x)\, dy = \frac{1}{2}\log_2 2\pi e\sigma_Z^2, $$

independently of $x$, and hence

$$ h(Y|X) = \frac{1}{2}\log_2 2\pi e\sigma_Z^2. \qquad (16) $$

Fig. 2. A simple example of nonlinear distortion, given by (17) for $a_{\max} = 10$. The channel is essentially linear for small $|x|$ and binary for large $|x|$.

For a given source distribution $f_X$, the output distribution $f_Y$ is obtained by marginalizing the joint distribution $f_{X,Y}(x,y) = f_X(x) f_{Y|X}(y|x)$, and the mutual information is calculated as $I(X;Y) = h(Y) - h(Y|X)$.
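As a quick sanity check of (16), the sketch below compares the closed form $\frac{1}{2}\log_2 2\pi e\sigma_Z^2$ with a direct numerical integration of $-\int f_{Y|X}\log_2 f_{Y|X}\,dy$ for two arbitrary (hypothetical) values of $a(x)$. It illustrates that the result does not depend on $x$. NumPy is assumed to be available, and the integration grid is an arbitrary choice.

```python
import math

import numpy as np


def h_cond_closed_form(sigma_z: float) -> float:
    """h(Y|X) = 0.5 * log2(2*pi*e*sigma_z^2) bits, as in (16)."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma_z**2)


def h_cond_numeric(a_x: float, sigma_z: float, half_width: float = 12.0,
                   n: int = 200_001) -> float:
    """Riemann sum of -f log2 f for f(y) = f_G((y - a(x)) / sigma_z) / sigma_z."""
    y = np.linspace(a_x - half_width * sigma_z, a_x + half_width * sigma_z, n)
    u = (y - a_x) / sigma_z
    f = np.exp(-u**2 / 2) / (sigma_z * math.sqrt(2 * math.pi))
    dy = y[1] - y[0]
    return float(-np.sum(f * np.log2(f)) * dy)


for a_x in (0.0, 7.3):    # two arbitrary channel outputs a(x)
    print(a_x, h_cond_numeric(a_x, 1.0), h_cond_closed_form(1.0))
```

For $\sigma_Z = 1$, both values are about 2.047 bits. Because $h(Y|X)$ is the same constant for every source, only $h(Y)$ has to be computed numerically when evaluating $I(X;Y)$.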

As an example, we select $a(x)$ in (14) as a smooth clipping function

$$ a(x) = a_{\max} \tanh\!\left( \frac{x}{a_{\max}} \right), \qquad (17) $$

where $a_{\max} > 0$ sets an upper bound on the output. This channel is chosen for its simplicity, not for its resemblance to any particular physical system. If the instantaneous channel input $X$ has a sufficiently high magnitude, the channel is essentially binary. For $X$ close to zero, on the other hand, the channel approaches a linear AWGN channel.

The channel parameters are $a_{\max} = 10$ and $\sigma_Z = 1$ throughout this section. The function $a(x)$ in (17), which represents the nonlinear part of the channel (14), is shown in Fig. 2.
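A minimal simulation of (14) with these parameters is sketched below. It assumes that (17) reads $a(x) = a_{\max}\tanh(x/a_{\max})$, so that the channel has unit slope near $x = 0$. The function names are illustrative only. The printed values show the nearly linear behavior for small $|x|$ and the saturation at $\pm a_{\max}$ for large $|x|$.

```python
import numpy as np

A_MAX = 10.0    # a_max in (17)
SIGMA_Z = 1.0   # noise standard deviation sigma_Z


def a(x, a_max: float = A_MAX):
    """Smooth clipping (assumed form of (17)): a_max * tanh(x / a_max)."""
    return a_max * np.tanh(np.asarray(x, dtype=float) / a_max)


def channel(x, rng: np.random.Generator, a_max: float = A_MAX,
            sigma_z: float = SIGMA_Z) -> np.ndarray:
    """One channel use per element of x: Y = a(X) + Z with Z ~ N(0, sigma_z^2)."""
    x = np.asarray(x, dtype=float)
    return a(x, a_max) + sigma_z * rng.standard_normal(x.shape)


rng = np.random.default_rng(0)
for x in (0.5, 2.0, 5.0, 20.0, 100.0):
    print(f"x={x:6.1f}  a(x)={float(a(x)):7.4f}")
print(channel(np.array([0.0, 30.0, -30.0]), rng))
```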

B. Mutual Information 

The mutual information $I(X;Y)$ is evaluated by numerical integration, as a function of the transmitted power $P$. No optimization over source distributions is carried out. The source distribution $f_X(x)$ is constructed from a given unit-power distribution $g(x)$, rescaled to the desired power $P$ as $f_X(x) = \alpha g(\alpha x)$, where $\alpha = 1/\sqrt{P}$. The results are presented in Fig. 3 for three continuous source pdfs $f_X(x)$: zero-mean Gaussian, zero-mean uniform, and single-sided exponential, defined as, respectively,

$$ f_X(x) = \frac{1}{\sqrt{P}}\, f_G\!\left( \frac{x}{\sqrt{P}} \right), $$

$$ f_X(x) = \begin{cases} \dfrac{1}{2\sqrt{3P}}, & -\sqrt{3P} < x < \sqrt{3P}, \\ 0, & \text{elsewhere}, \end{cases} $$

$$ f_X(x) = \begin{cases} \sqrt{2/P}\; e^{-x\sqrt{2/P}}, & x \ge 0, \\ 0, & x < 0. \end{cases} $$

Fig. 3. Mutual information for the nonlinear channel in (14) with $a_{\max} = 10$ and $\sigma_Z = 1$, when the source pdf is Gaussian, uniform, and exponential. The AWGN channel capacity is included for reference.

Fig. 4. Mutual information for the same channel, when the source follows various discrete distributions. The source probabilities are uniform.
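The rescaling $f_X(x) = \alpha g(\alpha x)$ with $\alpha = 1/\sqrt{P}$ can be sketched as follows. The three unit-power prototypes $g$ are the Gaussian, uniform, and single-sided exponential pdfs above evaluated at $P = 1$. The grid-based moment check is an illustrative choice, and NumPy is assumed. The exponential source should show the mean $\sqrt{P/2}$ referred to below.

```python
import numpy as np


def g_gauss(x):
    """Zero-mean, unit-power Gaussian pdf f_G."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)


def g_unif(x):
    """Zero-mean uniform pdf on (-sqrt(3), sqrt(3)), which has unit power."""
    return np.where(np.abs(x) < np.sqrt(3), 1 / (2 * np.sqrt(3)), 0.0)


def g_exp(x):
    """Single-sided exponential pdf with rate sqrt(2), which has unit power."""
    lam = np.sqrt(2)
    return np.where(x >= 0, lam * np.exp(-lam * np.clip(x, 0, None)), 0.0)


def rescale(g, power):
    """Return f_X(x) = alpha * g(alpha * x) with alpha = 1 / sqrt(power)."""
    alpha = 1 / np.sqrt(power)
    return lambda x: alpha * g(alpha * x)


def mean_and_power(pdf, lo, hi, n=400_001):
    """Riemann-sum estimates of E[X] and E[X^2] on the grid [lo, hi]."""
    x = np.linspace(lo, hi, n)
    f = pdf(x)
    dx = x[1] - x[0]
    return float(np.sum(x * f) * dx), float(np.sum(x**2 * f) * dx)


P = 100.0
for name, g in (("gauss", g_gauss), ("unif", g_unif), ("exp", g_exp)):
    print(name, mean_and_power(rescale(g, P), -40 * np.sqrt(P), 40 * np.sqrt(P)))
```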

At asymptotically low power $P$, the channel is effectively an AWGN channel. In this case, the mutual information is governed by the mean value of the source distribution, according to [65]. All zero-mean sources achieve approximately the same mutual information, which approaches the AWGN channel capacity. The asymptotic mutual information for the exponential distribution, whose mean is $\sqrt{P/2}$, is half that achieved by zero-mean distributions.

The mutual information curves for all three source pdfs reach a peak around $P = 100$, when a large portion of the source samples still fall in the linear regime of the channel. When the average power $P$ is further increased, the mutual information decreases asymptotically towards a value slightly less than 1 for the zero-mean sources and towards 0 for the exponential source. The asymptotes are explained by the fact that at high enough power, almost all source samples fall in the nonlinear regime, where the channel behaves as a 1-bit noisy quantizer. For the exponential source, whose samples are all nonnegative, the quantizer output is then almost always the same and therefore carries almost no information.

Similar results for various discrete source distributions are shown in Fig. 4. The studied one-dimensional constellations are on-off keying (OOK), binary phase-shift keying (BPSK), and $M$-ary pulse amplitude modulation ($M$-PAM). The constellation points are equally spaced and the source samples $X$ are chosen uniformly from these constellations. The mutual information for $M$-PAM constellations with $M \ge 4$ exhibits the same kind of peak as the continuous distributions in Fig. 3; indeed, a uniform distribution over equally spaced $M$-PAM approaches the continuous uniform distribution as $M \to \infty$.
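The discrete sources can be generated as sketched below. The sketch assumes that "equally spaced" means consecutive points separated by a constant distance, centered at zero for BPSK and $M$-PAM, with OOK using the points $\{0, b\}$. Each constellation is scaled so that its average power under uniform probabilities equals $P$. The function names are hypothetical.

```python
import numpy as np


def pam(m: int, power: float) -> np.ndarray:
    """Zero-mean, equally spaced M-PAM points with uniform average power `power`.

    BPSK is the special case m = 2.
    """
    pts = np.arange(m) - (m - 1) / 2          # e.g. m = 4: -1.5, -0.5, 0.5, 1.5
    return pts * np.sqrt(power / np.mean(pts**2))


def ook(power: float) -> np.ndarray:
    """On-off keying {0, b} with equal probabilities, so that b^2 / 2 = power."""
    return np.array([0.0, np.sqrt(2 * power)])


P = 100.0
for name, pts in (("OOK", ook(P)), ("BPSK", pam(2, P)), ("3-PAM", pam(3, P)),
                  ("8-PAM", pam(8, P))):
    print(name, np.round(pts, 3), "power:", np.mean(pts**2), "mean:", np.mean(pts))
```

OOK has mean $\sqrt{P/2}$, the same as the exponential source, and every odd-order PAM places one point at $X = 0$. This is the property behind the ternary asymptote discussed below.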



Similarly to the continuous case, the zero-mean discrete sources approach the AWGN channel capacity as $P \to 0$. Half this channel capacity is achieved by the OOK source, which has the same mean value $\sqrt{P/2}$ as the exponential source above. The asymptotics when $P \to \infty$ depend on whether $M$ is even or odd. For any even $M$, the channel again acts like a 1-bit quantizer and the asymptotic mutual information is slightly less than 1. For odd $M$, however, here exemplified by 3-PAM, there is a nonzero probability mass at $X = 0$, which means that the possible outputs are not only $Y = \pm a_{\max} + Z$ but also $Y = 0 + Z$. Hence, the channel asymptotically approaches a ternary-output noisy channel, whose mutual information is upperbounded by $\log_2 3 \approx 1.58$.

To summarize, this particular channel has the property that the mutual information for any source distribution rescaled from a fixed $g$ approaches a limit as $P \to \infty$, and this limit is upperbounded by $\log_2 3$. It might seem tempting to conclude that the channel capacity, which is the supremum of all mutual information curves, would behave similarly. However, as we shall see in Sec. VI-D, this conclusion is not correct, because the limit of a supremum is in general not equal to the supremum of a limit. Specifically, the asymptotic channel capacity is $\lim_{P\to\infty} C(P) = \lim_{P\to\infty} \sup_g I(X;Y)$, which is not equal to $\sup_g \lim_{P\to\infty} I(X;Y) \le \log_2 3$.

C. Constrained Capacity 

The standard method to calculate the channel capacity of a discrete memoryless channel is the Blahut-Arimoto algorithm [14, Sec. 10.8], [66, Ch. 9]. It has been extended to continuous-input, continuous-output channels in [67], [68]. Our approach is most similar to [68], in which distributions are represented by lists of samples, so-called particles. We consider a source distribution of the form



$$ f_X(x) = \sum_{i=1}^{N} w_i\, \delta(x - c_i), \qquad (18) $$

where $\delta(\cdot)$ is the Dirac delta function, $N$ is the number of particles, $c = (c_1, \ldots, c_N)$ are the particles, and $w = (w_1, \ldots, w_N)$ are the probabilities, or weights, associated with each particle. If $N$ is large enough, any distribution can be represented in the form (18) with arbitrarily small error. With this representation,

$$ f_Y(y) = \sum_{i=1}^{N} \frac{w_i}{\sigma_Z}\, f_G\!\left( \frac{y - a(c_i)}{\sigma_Z} \right), $$

which yields $h(Y)$, and thereby $I(X;Y)$, by numerical integration.

Fig. 5. Constrained capacities for the same channel, where the source is constrained to a given constellation but the source probabilities are optimally chosen for each $P$. The colored curves indicate the nonoptimized mutual informations from Fig. 4 (uniform probabilities).

Fig. 6. Channel capacity for the same channel. All mutual information and constrained capacity curves from Figs. 3-5 are included for reference (colored). Even though most mutual information and constrained capacity curves decrease, the channel capacity does not. The three markers refer to distributions in Fig. 7.
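The computation of $I(X;Y)$ for a particle representation (18) can be sketched as follows. $f_Y$ is evaluated on a grid as the Gaussian mixture above, $h(Y)$ is obtained by a Riemann sum, and $h(Y|X)$ comes from (16). This is an illustrative implementation, not the authors' code. It assumes that (17) reads $a(x) = a_{\max}\tanh(x/a_{\max})$, and the grid parameters are arbitrary choices. At very high power, uniform BPSK and 3-PAM should give roughly 1 bit and $\log_2 3$ bits, consistent with the asymptotes discussed in Sec. VI-B.

```python
import math

import numpy as np

A_MAX, SIGMA_Z = 10.0, 1.0


def a(x):
    """Assumed clipping function (17): a_max * tanh(x / a_max)."""
    return A_MAX * np.tanh(np.asarray(x, dtype=float) / A_MAX)


def mutual_information(c, w, sigma_z=SIGMA_Z, span=10.0, n_grid=20_001):
    """I(X;Y) in bits for f_X = sum_i w_i delta(x - c_i) over Y = a(X) + Z."""
    c = np.asarray(c, dtype=float)
    w = np.asarray(w, dtype=float)
    centers = a(c)
    y = np.linspace(centers.min() - span * sigma_z,
                    centers.max() + span * sigma_z, n_grid)
    dy = y[1] - y[0]
    u = (y[None, :] - centers[:, None]) / sigma_z          # shape (N, n_grid)
    f_y = (w[:, None] * np.exp(-u**2 / 2)).sum(axis=0) / (sigma_z * math.sqrt(2 * math.pi))
    pos = f_y > 0                                           # skip 0 * log 0 terms
    h_y = -np.sum(f_y[pos] * np.log2(f_y[pos])) * dy
    h_y_given_x = 0.5 * math.log2(2 * math.pi * math.e * sigma_z**2)   # (16)
    return float(h_y - h_y_given_x)


P = 1e4
print("BPSK :", mutual_information([-math.sqrt(P), math.sqrt(P)], [0.5, 0.5]))
s = math.sqrt(1.5 * P)                                      # 3-PAM with power P
print("3-PAM:", mutual_information([-s, 0.0, s], [1 / 3, 1 / 3, 1 / 3]))
```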

The objective for the optimization is to maximize the Lagrangian function

$$ L(c, w, \lambda_1, \lambda_2) = h(Y) + \lambda_1 \left( \sum_i w_i - 1 \right) + \lambda_2 \left( \sum_i w_i c_i^2 - P \right), $$

where the Lagrange multipliers $\lambda_1$ and $\lambda_2$ are determined to maintain the constraints $\sum_i w_i = 1$ and $\sum_i w_i c_i^2 = P$ during the optimization process. Since $h(Y|X)$ in (16) does not depend on the source, maximizing $h(Y)$ is equivalent to maximizing $I(X;Y)$. The gradients of $L$ with respect to $c$ and $w$ are calculated, and a steepest descent algorithm (or more accurately, "steepest ascent") is applied to maximize $L$. In each iteration, a step is taken in the direction of either of the two gradients.¹ The step size is determined using the golden section method [69, pp. 271-273]. Constrained capacities were obtained by including additional constraints on $c$ and/or $w$. Several initial values $(c, w)$ were tried, and $N$ was increased until convergence.
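The golden section line search used for the step size can be sketched as follows. This is a generic textbook version for a unimodal one-dimensional function, not the authors' implementation. In the optimization above, $\phi(t)$ would be $L$ evaluated after a step of length $t$ along one of the two gradients. The toy objective in the usage line is hypothetical.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2      # 1 / golden ratio, about 0.618


def golden_section_max(phi, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Approximate maximizer of a unimodal function phi on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)         # interior points with a < c < d < b
    d = a + INV_PHI * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc > fd:                   # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = phi(c)
        else:                         # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = phi(d)
    return (a + b) / 2


# Toy usage: best step size along a direction for a concave quadratic.
print(golden_section_max(lambda t: -(t - 0.3) ** 2, 0.0, 1.0))
```

Each iteration shrinks the bracket by the factor 0.618 and reuses one of the two previous function values. This makes the method attractive when every evaluation of $\phi$ requires a numerical integration of $h(Y)$.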

The topography of $L$ as a function of $c$ and $w$ turned out to include vast flat fields, where a small step has little influence on $L$. This made the optimization numerically challenging. No suboptimal local maxima were found for the studied channel and constraints, although for nonlinear channels in general, the mutual information as a function of the source distribution may have multiple maxima.²

¹ Moving in the direction of the joint gradient turned out to be less efficient, because for small and large $P$, the numerical values of $c$ and $w$ are not of the same order of magnitude.

Using this optimization technique, some constrained capacities are computed. Specifically, we investigate how much the mutual information curves in Fig. 4 can be improved if the source samples $X$ are chosen from the constellation points $c$ with unequal probabilities $w$, so-called probabilistic shaping. The constellations are the same as before, equally spaced OOK, BPSK, and $M$-PAM, but the probability of each constellation point is allowed to vary. For each power $P$, the mutual information is maximized over all probabilities. The constellation is scaled to meet the power requirement but otherwise not changed.

The results are shown in Fig. 5 for the same channel as before ((14) with (17) and parameters $a_{\max} = 10$ and $\sigma_Z = 1$). For BPSK, probabilistic shaping offers no improvement over the mutual information of uniform BPSK in Fig. 4, because equal probabilities turn out to be optimal for all $P$. However, the constrained capacity of OOK with optimal probabilistic shaping is about twice the mutual information of uniform OOK at low $P$. The improvements for 3- and 4-PAM are marginal, whereas the performance of 8-PAM is significantly improved for medium to high power, and its peak increases from 2.28 to 2.37 bits/symbol. The general trends, however, are the same as for the mutual information in Fig. 4. The constrained capacity for any probabilistically shaped $M$-PAM system with $M \ge 4$ displays a prominent peak around $P = 100$, after which the constrained capacity decreases again towards the same asymptotes as in the uniform case.

Obviously, there exist many other types of source constraints. Some of these have constrained capacities similar to those of the probabilistically shaped discrete constellations shown in Fig. 5, with a peak at a finite power and a relatively weak asymptotic performance, but other classes of sources can be conceived that are better suited to this nonlinear channel at high transmitted power. However, instead of designing further constrained sources, we will now proceed to study the channel capacity, which is the main concern of this paper.

² An exception occurs when the constellation points $c$ are fixed and the only constraint is $\sum_i w_i = 1$. In this special case, the mutual information is a concave function of $w$ for any channel [?, pp. 33, 191], and thus every local maximum is a global maximum.

Fig. 7. Almost capacity-achieving source distributions for $P = 10$, 100, and 1000.

D. Channel Capacity 

By optimizing the mutual information over unconstrained source distributions $f_X \in \Omega(P)$, according to the method outlined in Sec. VI-C, we obtain the channel capacity (2). As mentioned in Sec. III, the channel capacity is a property of the channel alone, not the source, so there exists just one channel capacity curve for a given channel.

This channel capacity is shown in Fig. 6 for the studied channel. As promised by the Law of monotonic channel capacity (Theorem 1), the curve does not have the peak at a finite $P$ that characterizes most mutual information and constrained capacity curves. The channel capacity follows the mutual information of the Gaussian distribution closely until around $P = 100$. However, while the Gaussian case attains its maximum mutual information $I(X;Y) = 2.44$ bits/symbol at $P = 130$ and then begins to decrease, the channel capacity continues to increase towards its asymptote $\lim_{P\to\infty} C(P) = 2.54$ bits/symbol.

This asymptotic channel capacity can be explained as follows. Define the random variable $A = a(X)$. Since $a(\cdot)$ is a continuous, strictly increasing function, there is a one-to-one mapping between $X \in (-\infty, \infty)$ and $A \in (-a_{\max}, a_{\max})$. Thus $I(X;Y) = I(A;Y)$, where $Y = A + Z$. This represents a standard discrete-time AWGN channel whose input $A$ is subject to a peak power constraint. For large $P$, the power constraint on $X$ hardly restricts the distribution of $A$, because the required power can be supplied by rare samples of very large magnitude. The asymptotic channel capacity therefore equals the capacity of this peak-limited channel. The constrained capacity of a peak-power-limited AWGN channel was bounded already in [1, Sec. 25] and computed numerically in [60], where it was also shown that the capacity-achieving distribution is discrete. The asymptote in Fig. 6, which is 2.54 bits/symbol or, equivalently, 1.76 nats/symbol, agrees perfectly with the constrained capacity in [60, Fig. 2] for $a_{\max}/\sigma_Z = 10$.
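The asymptote can be approached independently with the classical discrete Blahut-Arimoto algorithm, by quantizing the peak-limited AWGN channel $Y = A + Z$, $|A| \le a_{\max}$. The sketch below is not the particle method of Sec. VI-C. The input grid (step 0.1), the output grid (step 0.05 over $[-16, 16]$), and the iteration count are illustrative assumptions, so the result is only an approximation of the peak-limited capacity. With $a_{\max}/\sigma_Z = 10$, it should land in the vicinity of the 2.54 bits/symbol quoted above.

```python
import numpy as np


def blahut_arimoto(W: np.ndarray, iters: int = 3000, tol: float = 1e-7):
    """Capacity (bits) and input pmf of a discrete channel W[x, y] = P(y | x).

    Returns Blahut's lower bound log(sum_x p(x) exp(D_x)), where D_x is the
    divergence D(W(.|x) || q) in nats and q is the current output pmf.
    """
    p = np.full(W.shape[0], 1.0 / W.shape[0])
    w_log_w = np.sum(W * np.log(W), axis=1)     # assumes all W > 0
    lower = 0.0
    for _ in range(iters):
        q = p @ W
        d = w_log_w - W @ np.log(q)              # D_x in nats
        lower = np.log(np.sum(p * np.exp(d)))
        upper = np.max(d)                        # lower <= C <= upper
        p = p * np.exp(d)
        p /= p.sum()
        if upper - lower < tol:
            break
    return lower / np.log(2), p


A_MAX, SIGMA_Z = 10.0, 1.0
a_grid = np.linspace(-A_MAX, A_MAX, 201)                   # candidate inputs A
y_grid = np.arange(-A_MAX - 6, A_MAX + 6 + 1e-9, 0.05)     # quantized outputs Y
W = np.exp(-(y_grid[None, :] - a_grid[:, None]) ** 2 / (2 * SIGMA_Z**2))
W /= W.sum(axis=1, keepdims=True)                          # rows are pmfs

cap_bits, p_opt = blahut_arimoto(W)
print(f"{cap_bits:.3f} bits/symbol = {cap_bits * np.log(2):.3f} nats/symbol")
print("mass points with p > 0.01:", np.round(a_grid[p_opt > 0.01], 1))
```

Multiplying by $\ln 2$ converts bits to nats. The second printed line lists the grid points that retain noticeable probability, which gives a rough picture of the discrete capacity-achieving distribution.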

Some of the (almost) capacity-achieving source distributions are shown in Fig. 7, numerically optimized as described in Sec. VI-C. As mentioned, the topography of $L$ as a function of the source parameters for a given $P$ includes vast, almost flat, fields, where many source distributions yield the same mutual information, within a numerical precision of 2-3 decimals.



For $P = 10$, the optimized discrete source is essentially a nonuniformly sampled Gaussian pdf, and the obtained channel capacity, 1.61 bits/symbol, has the same value as the mutual information of a continuous Gaussian pdf, shown in Fig. 3. For $P = 100$ and 1000, the distribution is more uniform in the range where the channel behaves more or less linearly, which for this channel is approximately $-a_{\max}/2 < x < a_{\max}/2$, with some high-power outliers in the nonlinear range $|x| > a_{\max}$. In all cases, increasing the number of particles $N$ beyond what is shown in Fig. 7 does not increase the mutual information significantly. From this we infer that these discrete sources perform practically as well as the best discrete or continuous sources for this channel.

Although the capacity-achieving distributions would look quite different for other types of nonlinear channels, a general observation can be made from Fig. 7: even at high average power, the source should, most of the time, generate samples with moderate power, for which the channel is good. The high average power is achieved by a single particle having a very large power; thus, the capacity-achieving distribution is a satellite distribution [23].

VII. Numerical Example II: Interference Channel 

This section demonstrates two simple examples of the scenario studied in Sec. V-B, where the interference has a fixed distribution, which is rescaled depending on the primary source distribution. We shall see that this type of interference model can lead to simple and exact expressions for the channel capacity. This partly explains the popularity of such models, since exact capacity expressions are rare in information theory. However, as discussed in Sec. V-B, the applicability of this model is limited to heterogeneous multiuser systems, where the primary user applies capacity-achieving coding and modulation while at least one interferer does not.

We consider the real additive noise channel (4). The noise vector $Z$ is modeled as white Gaussian noise, which is independent of $X$, but its power $P_Z = \mathbb{E}[\|Z\|^2]$ depends on $P = \mathbb{E}[\|X\|^2]$. We write $P_Z = P_Z(P)$ to make this dependence explicit. For a fixed source power $P$, this is a single-user AWGN channel with signal-to-noise ratio $P/P_Z(P)$. For another $P$, the signal-to-noise ratio is different, but it is still an AWGN channel. The capacity-achieving distribution $f_X$ is zero-mean Gaussian for every $P$ and the channel capacity is

$$ C = \frac{1}{2}\log_2\left( 1 + \frac{P}{P_Z(P)} \right). \qquad (19) $$

Preprint, Sept. 13, 2012

Fig. 8. A simple model of an optical WDM channel with dispersion, noise, and FWM is given by (4) and (20). Its channel capacity $C(P)$ decreases at high power $P$, which is explained by different requirements on the source distributions of the primary user (coded) and the interferers (uncoded).

Fig. 9. The channel capacity $C(P)$ of another nonlinear interference channel, defined by (4) and (21), which lowerbounds the channel capacity of an optical channel with combined dispersion, noise, and XPM.

The net result of these simplifications is that the interference channel model, with coupled transmitted powers, is reduced to a dynamic point-to-point channel model, whose channel capacity is given by a simple, exact expression. Two instances of such models are particularly popular in optical communications. Already in 1993, Splett et al. showed that the nonlinear interference caused by four-wave mixing (FWM) can be modeled as AWGN, whose variance is proportional to the cube of the signal power [38], under some assumptions on the dispersion and the optical amplifier noise. Thus

$$ P_Z(P) = b_0 + b_3 P^3, \qquad (20) $$

where $b_0$ accounts for the amplifier noise and $b_3$ for the nonlinear distortion. Similar cubic expressions for the nonlinear interference in WDM systems due to FWM were discussed in [39]-[43]. The channel capacity (19) with this interference model is illustrated in Fig. 8 for $b_0 = 1$ and different values of $b_3$. At low transmitted power, the channel behaves like the single-user AWGN channel. As the power increases, the channel capacity reaches a peak, after which it decreases again towards zero at high power. The interference gets so strong that communication is virtually impossible even with optimized coding and modulation.
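The location of the peak in Fig. 8 follows from a short calculation. Inserting (20) into (19), the capacity is maximized where the signal-to-noise ratio $P/(b_0 + b_3P^3)$ is maximized. Setting its derivative to zero gives $b_0 + b_3P^3 - 3b_3P^3 = 0$, i.e., $P^* = (b_0/(2b_3))^{1/3}$, with peak signal-to-noise ratio $2P^*/(3b_0)$. The sketch below checks this against a grid search for $b_0 = 1$ and a hypothetical $b_3 = 0.01$.

```python
import numpy as np


def capacity_fwm(P, b0: float, b3: float):
    """C(P) = 0.5 * log2(1 + P / P_Z(P)) with P_Z(P) = b0 + b3 * P^3, from (19)-(20)."""
    P = np.asarray(P, dtype=float)
    return 0.5 * np.log2(1 + P / (b0 + b3 * P**3))


b0, b3 = 1.0, 0.01                                  # b3 is a hypothetical value
p_star = (b0 / (2 * b3)) ** (1 / 3)                 # closed-form optimum power
c_star = 0.5 * np.log2(1 + 2 * p_star / (3 * b0))   # closed-form peak capacity

P = np.logspace(-2, 3, 200_001)
C = capacity_fwm(P, b0, b3)
print(f"closed form: P* = {p_star:.4f}, C* = {c_star:.4f} bits")
print(f"grid search: P* = {P[np.argmax(C)]:.4f}, C* = {C.max():.4f} bits")
print(f"C at P = 1000: {float(capacity_fwm(1000.0, b0, b3)):.2e} bits")
```

Beyond $P^*$ the signal-to-noise ratio decays like $1/(b_3P^2)$, so the capacity tends to zero as described above.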

The second instance represents another kind of nonlinear interference in optical communications, namely, cross-phase modulation (XPM). In 2001, Mitra and Stark [48] proposed a channel model for WDM systems dominated by XPM, again under certain conditions on the dispersion and noise. The channel model does not take the form of a conditional distribution $f_{Y|X}$ between its discrete-time input $X$ and output $Y$, and thus cannot be immediately used for channel capacity calculations, but the second moments between $X$ and $Y$ were estimated. The channel capacity of the XPM system is then lowerbounded by the channel capacity of another channel, whose input and output are jointly Gaussian with the same second-order moments as the XPM model [?], [48]. This lower bound is obtained from (19) by taking the interference power to be

$$ P_Z(P) = (P_N + P)\, e^{P^2/P_0} - P, \qquad (21) $$

where $P_N$ represents the optical amplifier noise and $P_0$ the nonlinear distortion. The capacity of this AWGN channel, which thus lowerbounds the channel capacity of the XPM-dominated WDM system, is

$$ \begin{aligned} C &\triangleq \frac{1}{2}\log_2\left( 1 + \frac{P}{(P_N + P)\, e^{P^2/P_0} - P} \right) \\ &= \frac{1}{2}\log_2\left( 1 + \frac{P e^{-P^2/P_0}}{P_N + P\left(1 - e^{-P^2/P_0}\right)} \right), \end{aligned} \qquad (22) $$

in agreement with [48]. Similar lower bounds were discussed in [?], [?], [?]-[?], [?, Ch. 11]. The behavior of (22), illustrated in Fig. 9, is similar to the behavior of (20), but the channel capacity decays even faster with increasing power in this case.
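The two forms of (22) can be checked against each other numerically, which also shows the fast decay at high power. The sketch assumes hypothetical values $P_N = 1$ and $P_0 = 100$. At high power, the second form is the safer one to evaluate in floating point, because the factor $e^{P^2/P_0}$ in the first form overflows.

```python
import numpy as np


def pz_xpm(P, p_n: float, p_0: float):
    """Interference power (21): (P_N + P) * exp(P^2 / P_0) - P."""
    return (p_n + P) * np.exp(P**2 / p_0) - P


def capacity_xpm_form1(P, p_n: float, p_0: float):
    """First form of (22): 0.5 * log2(1 + P / P_Z(P))."""
    return 0.5 * np.log2(1 + P / pz_xpm(P, p_n, p_0))


def capacity_xpm_form2(P, p_n: float, p_0: float):
    """Second form of (22), which avoids the overflow of exp(P^2 / P_0)."""
    e = np.exp(-P**2 / p_0)
    return 0.5 * np.log2(1 + P * e / (p_n + P * (1 - e)))


p_n, p_0 = 1.0, 100.0                       # hypothetical parameters
P = np.linspace(0.1, 20.0, 200)             # exp(P^2 / P_0) stays finite here
gap = np.max(np.abs(capacity_xpm_form1(P, p_n, p_0) - capacity_xpm_form2(P, p_n, p_0)))
print("max difference between the two forms:", gap)
for p in (1.0, 5.0, 10.0, 30.0, 100.0):
    print(p, capacity_xpm_form2(p, p_n, p_0))
```

The exponential factor makes the signal-to-noise ratio vanish much faster than the $1/P^2$ decay of the FWM model, in line with Fig. 9.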

A natural question is how tight (22) is, when interpreted as a lower bound on the channel capacity of the XPM system. It was conjectured in [48] that the XPM channel capacity would have a peak, similarly to its Gaussian lower bound, but as far as the author knows, this has not yet been proved or disproved.

As discussed in Sec. V-B, the peaky behavior of Figs. 8-9 is typical for all interference channel models of the type (4), with fixed interference distributions, if $P_Z(P)$ increases faster than linearly in $P$. This behavior, which is fundamentally different from the three other interference channel scenarios considered in this paper, only occurs for heterogeneous multiuser systems.

VIII. Summary and Conclusions 

It was proved that the channel capacity is a nondecreasing 
function of a cost (such as transmitted power) in the following 
cases. 

• Single-user channels for which the joint input-output distribution $f_{X,Y}$ is separable into two components, $f_{X,Y}(x,y) = f_X(x) f_{Y|X}(y|x)$, where the first component depends only on the source and the second only on the channel (static channel model).

• Interference channels where all users, except the one of 
interest, transmit data from fixed source distributions. 

• Interference channels where all users transmit data from 
the same (optimized) distribution. 

• Interference channels where the distributions of all users 
are optimized jointly. 

In contrast, there are numerous examples in the literature 
where the channel capacity has a peak at a certain cost, after 
which it decreases towards zero. These examples all pertain 
to one of the following two cases: 

• Single-user channels where $f_{X,Y}(x,y)$ is not separable into one component that depends on the source only and another that depends on the channel only (dynamic channel model).

• Interference channels where the transmission scheme of 
one user (the one of interest) is optimized while the 
other users satisfy the same power constraint by pure 
amplification. 

Since the channel capacity behaves fundamentally differ- 
ently in these cases, an important conclusion is that static and 
dynamic channel models should be kept distinct. In particular, 
a static channel model should not be approximated by a 
simpler dynamic model, if the purpose is to compute channel 
capacity (although such approximations may serve other per- 
formance metrics excellently). Furthermore, for WDM systems 
and other interference channels, the assumptions on how the 
users adapt their transmission schemes in response to changing 
traffic from the other users have a decisive impact on the 
results. Future research may take a game-theoretic approach 
to this problem, sensing and adapting transmission in several 
iterations. 

Nonmonotonic capacity-cost functions have, to the best of our knowledge, only been reported in optical communications, never in wireless or copper-wired applications, nor in fundamental information theory. We believe that this is due more to different modeling traditions than to any fundamental technological differences between the fields.